Improved biclustering of microarray data demonstrated through systematic performance tests
Authors
Abstract
A new algorithm is presented for fitting the plaid model, a biclustering method developed for clustering gene expression data. The approach is based on speedy individual differences clustering and uses binary least squares to update the cluster membership parameters, making use of the binary constraints on these parameters and simplifying the other parameter updates. The performance of both algorithms is tested on simulated data sets designed to imitate (normalised) gene expression data, covering a range of biclustering configurations. Empirical distributions for the components of these data sets, including non-systematic error, are derived from a real set of microarray data. A set of two-way quality measures is proposed, based on one-way measures commonly used in information retrieval, to evaluate the quality of a retrieved bicluster with respect to a target bicluster in terms of both genes and samples. By defining a one-to-one correspondence between target biclusters and retrieved biclusters, the performance of each algorithm can be assessed. The results show that, using appropriately selected starting criteria, the proposed algorithm outperforms the original plaid model algorithm across a range of data sets. Furthermore, through the rigorous assessment of the plaid model a benchmark for future evaluation of biclustering methods is established. © 2004 Elsevier B.V. All rights reserved.
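The abstract does not give the definitions of the two-way quality measures, so the following is only an illustrative sketch of the general idea. It assumes a bicluster is a pair (set of genes, set of samples) and carries the familiar information-retrieval measures (recall, precision, and their harmonic mean) over to the (gene, sample) cells the bicluster covers. The function names `cells` and `two_way_scores` and the toy gene and sample labels are hypothetical, and the paper's own measures may be defined differently.

```python
from itertools import product


def cells(genes, samples):
    """Return the set of (gene, sample) cells covered by a bicluster."""
    return set(product(genes, samples))


def two_way_scores(target, retrieved):
    """Cell-level recall, precision and F1 of `retrieved` against `target`.

    Each bicluster is a (genes, samples) pair of iterables.
    This is an assumed, illustrative definition, not the paper's.
    """
    t = cells(*target)
    r = cells(*retrieved)
    overlap = len(t & r)
    recall = overlap / len(t) if t else 0.0
    precision = overlap / len(r) if r else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


# Toy example: 4 genes x 2 samples in the target,
# 3 genes x 3 samples in the retrieved bicluster.
target = ({"g1", "g2", "g3", "g4"}, {"s1", "s2"})
retrieved = ({"g1", "g2", "g5"}, {"s1", "s2", "s3"})
recall, precision, f1 = two_way_scores(target, retrieved)
```

In this toy case the two biclusters share 4 cells, out of 8 target cells and 9 retrieved cells. Scoring cells rather than genes alone penalises a retrieved bicluster that finds the right genes but the wrong samples, which is the motivation for a two-way measure.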
Similar resources
Query-based Biclustering using Formal Concept Analysis
Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data including information networks, microarray experiments, and bag of words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate prior knowledge and expertise from the user. To this end, query-based biclustering algorithms that are rece...
Biclustering with Background Knowledge using Formal Concept Analysis
Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data including information networks, microarray experiments, and bag of words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate background knowledge and expertise from the user. To this end, query-based biclustering algorithms have bee...
A memetic algorithm for discovering negative correlation biclusters of DNA microarray data
Most biclustering algorithms for microarrays data analysis focus on positive correlations of genes. However, recent studies demonstrate that groups of biologically significant genes can show negative correlations as well. So, discovering negatively correlated patterns from microarrays data represents a real need. In this paper, we propose a Memetic Biclustering Algorithm (MBA) which is able to ...
Methods to Bicluster Validation and Comparison in Microarray Data
There are lots of validation indexes and techniques to study clustering results. Biclustering algorithms have been applied in Systems Biology, principally in DNA Microarray analysis, for the last years, with great success. Nowadays, there is a big set of biclustering algorithms each one based in different concepts, but there are few intercomparisons that measure their performance. We review and...
Application of biclustering with the "Large Average Submatrices" method to gene expression data from DNA microarrays
Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. Using this technology, which made it possible to simultaneously analyze expression levels for thousands of genes under different conditions, massive amounts of information will be obtained. While traditional clustering methods, such as hierarchical and K-means clustering have been...
Journal title:
- Computational Statistics & Data Analysis
Volume 48, Issue
Pages -
Publication date: 2005